Tight Bounds for Linear Sketches of Approximate Matchings
Abstract

We resolve the space complexity of linear sketches for approximating the maximum matching
problem in dynamic graph streams, where the stream may include both edge insertions and
deletions. Specifically, we show that for any $\epsilon > 0$, there exists a one-pass streaming algorithm
which only maintains a linear sketch of size $\tilde{O}(n^{2-3\epsilon})$ bits and recovers an $n^{\epsilon}$-approximate
maximum matching in dynamic graph streams, where $n$ is the number of vertices in the graph.
In contrast to the extensively studied insertion-only model, to the best of our knowledge, no non-trivial
single-pass streaming algorithms were previously known for approximating the maximum
matching problem on general dynamic graph streams.

Furthermore, we show that our upper bound is essentially tight. Namely, any linear sketch for
approximating the maximum matching to within a factor of $O(n^{\epsilon})$ has to be of size $n^{2-3\epsilon-o(1)}$
bits. We establish this lower bound by analyzing the corresponding simultaneous number-in-hand
communication model, with a combinatorial construction based on Ruzsa-Szemeredi
graphs.
1 Introduction


Massive datasets routinely arise in various application domains such as web-scale graphs and 
social networks. The space requirement for performing computations on these massive datasets 
can easily become prohibitively large. A common way of managing the space requirement is to 
consider algorithms in the streaming model of computation. In this model, formally introduced 
in the seminal work of [7], an algorithm is allowed to make a single or a few passes over the input 
while using space much smaller than the input size. We refer the reader to [...] for a survey of
classical results in this model. 

In recent years, there has been extensive work on the design of streaming algorithms for various
graph problems, including connectivity, minimum spanning trees, spanners, sparsifiers, matchings,
etc. (see the survey by McGregor [34] for a summary of these results). Two types of graph
streams are mainly studied in the literature: in the insertion-only model, the stream contains
only edge insertions, and in the dynamic model, the stream contains both edge insertions and deletions.
The focus of this paper is the dynamic model. The input in this model, called dynamic
graph streams, can be defined formally as follows.

Definition 1 ([...]). A dynamic graph stream $S = (a_1, a_2, \ldots, a_t)$ defines a multi-graph $G(V,E)$
on $n$ vertices $V = [n]$. Each $a_k$ is a triple $(i_k, j_k, \Delta_k)$ where $i_k, j_k \in [n]$ and $\Delta_k \in \{-1,+1\}$.
The multiplicity of an edge $(i,j)$ is defined to be:

$$A(i,j) = \sum_{a_k:\ i_k = i \,\wedge\, j_k = j} \Delta_k.$$

The multiplicity of every edge is required to be always non-negative.

The streaming model where the frequency of every entry is always non-negative is standard 
for graph problems, and this model is generally referred to as the strict turnstile model in 
the literature (as opposed to the turnstile model, which allows negative frequencies also). In 
this paper, we study the maximum matching problem for dynamic graph streams in which the 
algorithm is only allowed to make a single pass over the stream. 

Matchings have received a lot of attention in the graph stream literature [4, 6, 13, 15, 18, 21, 22, 26, 27, 30, 33, 40].
We briefly summarize the previous results for adversarially ordered streams.
A weaker notion of randomly ordered streams (which is less relevant to our work) is also often
considered; for results in this model, we refer the reader to [27, 30] and references therein.

For the problem of recovering a maximum matching in bipartite graphs, a trivial lower
bound on the space complexity of any streaming algorithm is $\Omega(n)$, which is required just for
storing the matching edges. Therefore, this problem is usually studied in the semi-streaming
model (originally introduced by Feigenbaum et al. [18]), where the algorithm is allowed to
use $O(n \cdot \mathrm{polylog}(n))$ bits of space. Moreover, no exact algorithm that uses $o(n^2)$ space can
exist [18]. This motivates the study of $\alpha$-approximate algorithms that output a matching of
size within a multiplicative factor $\alpha$ of the optimum. For single-pass semi-streaming algorithms
in the insertion-only model, the best known approximation factor is 2, which is obtained by
simply maintaining a maximal matching during the stream. On the negative side, it is shown
by [21, 25] that any streaming algorithm that achieves an approximation factor better than
$e/(e-1)$ requires the storage of $n^{1+\Omega(1/\log\log n)}$ bits. For dynamic graph streams, to the best
of our knowledge, no non-trivial single-pass streaming algorithm using space $o(n^2)$ was known.
Resolving the space complexity of matchings in single-pass dynamic graph streams was
posed as an open problem at the Bertinoro workshop on sublinear and streaming algorithms in
2014 [1].

For the problem of estimating the size of a maximum matching, a strongly sublinear $o(n)$
space regime has been considered. In the single-pass insertion-only model, when edges arrive
in an adversarial order, the only known positive result for estimating the matching size is that
of [...], which showed that a constant factor approximation is possible in $\tilde{O}(n^{2/3})$ space under
the assumption that the underlying graph is planar. The same paper also provides a lower
bound of [...] (resp. $\Omega(n)$) bits of space for randomized (resp. deterministic) algorithms that
approximate the matching size in bipartite graphs to within a factor of 3/2. For the state of the
art in the streaming model which allows multiple passes over the stream, we refer the reader
to [...] and references therein.

To the best of our knowledge, the only result concerning matchings in single-pass dynamic
graph streams is the recent paper by Chitnis et al. [12], which provides an algorithm
for computing a maximal matching of size $k$ using $\tilde{O}(nk)$ space. For multi-pass dynamic graph
streams, [3] provides a $(1-\epsilon)$-approximation scheme for the weighted non-bipartite matching
problem using $O(p/\epsilon)$ passes with $\tilde{O}(n^{1+1/p})$ space (see also [34]).

Finally, closely related to our work is a recent line of work on the communication complexity
of approximate matchings in the multi-party setting [...]. The one that is closest to ours
is [24], which shows a tight bound of [...] on the total communication required to compute
an $\alpha$-approximate matching for bipartite graphs in the $k$-party message passing model, where
the edges of the input graph are arbitrarily partitioned between the players.

Linear sketches. One of the most powerful techniques for designing streaming algorithms
is linear sketching. Let $n$ be the number of vertices in the input graph. Then edge multiplicities
can be treated as a vector $f \in \mathbb{R}^{\binom{n}{2}}$ with entries $f_e$. Let $A \in \mathbb{R}^{d \times \binom{n}{2}}$ be a (possibly randomly
chosen) matrix. Then $A \cdot f$ is referred to as a linear sketch of the input stream. If all that
a streaming algorithm maintains is such a linear sketch, then the space requirement of the
algorithm is proportional to $d$. On any incoming update $(i_k, j_k, \Delta_k)$, the linear sketch is
updated to $A \cdot f' = A \cdot f + \Delta_k \cdot A \cdot \mathbf{1}_{(i_k,j_k)}$, where $f'$ is the new vector of edge multiplicities and
$\mathbf{1}_{(i_k,j_k)} \in \mathbb{R}^{\binom{n}{2}}$ is a unit vector whose only non-zero entry is the $(i_k,j_k)$ entry. At the end of the
stream, the algorithm can apply an arbitrary function to the linear sketch to compute the final
answer.
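As a minimal illustration of this update rule, the Python sketch below maintains $A \cdot f$ under a short stream of updates and checks that the result equals the sketch of the final multiplicity vector. The random $\pm 1$ matrix, the 0-based vertex labels, and the helper names (`edge_index`, `apply_update`, `sketch_of`) are illustrative assumptions rather than anything from the paper; a space-efficient sketch would regenerate columns of $A$ from a short seed instead of storing $A$.

```python
import random

def edge_index(i, j, n):
    """Map an unordered pair {i, j} of 0-based vertices (i != j) to a coordinate in [0, C(n, 2))."""
    if i > j:
        i, j = j, i
    # Row-major order over pairs: (0,1), (0,2), ..., (0,n-1), (1,2), ...
    return i * n - i * (i + 1) // 2 + (j - i - 1)

def make_sketch_matrix(d, n, seed=0):
    """A random d x C(n, 2) matrix with +/-1 entries (an illustrative choice of A)."""
    rng = random.Random(seed)
    m = n * (n - 1) // 2
    return [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(d)]

def apply_update(sketch, A, i, j, delta, n):
    """A f' = A f + delta * A 1_{(i,j)}: add delta times the column of A for edge (i, j)."""
    e = edge_index(i, j, n)
    for row in range(len(A)):
        sketch[row] += delta * A[row][e]

def sketch_of(A, f):
    """Compute A f directly from the final multiplicity vector."""
    return [sum(a * x for a, x in zip(row, f)) for row in A]

n, d = 5, 4
A = make_sketch_matrix(d, n, seed=1)
stream = [(0, 1, +1), (2, 3, +1), (0, 1, -1), (1, 4, +1)]  # (i_k, j_k, Delta_k)
sketch = [0] * d
f = [0] * (n * (n - 1) // 2)
for i, j, delta in stream:
    apply_update(sketch, A, i, j, delta, n)
    f[edge_index(i, j, n)] += delta
```

The final comparison holds for any choice of $A$: the maintained vector depends only on $f$, not on the order of the updates, which is exactly why deletions need no special handling.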

Linear sketching is the only existing technique for designing streaming algorithms in the
turnstile model and even for dynamic graph streams.¹ Linear sketches are also one of the main
techniques for designing mergeable summaries [2] used in distributed computing. These facts
have made linear sketches a computational model of their own. Multiple results are known
about the power and limitations of linear sketches, e.g. [...]. In fact, it is shown that
any one-pass turnstile streaming algorithm can be implemented by maintaining only a linear
sketch of the input during the stream [32].² For an in-depth introduction to linear sketching and
its applications to dynamic streams and distributed computing, we refer the reader to the recent
surveys by McGregor [34] (graph streams) and Woodruff [35] (computational linear algebra).

1.1 Our results 

We resolve the space complexity of linear sketches for approximating maximum matchings by 
proving tight upper and lower bounds on the space requirement. For the upper bound, we 
establish the following theorem. 

Theorem 1. There is a single-pass randomized streaming algorithm that takes as input a parameter
$0 < \epsilon < 1/2$ and a bipartite graph $G$ with $n$ vertices, specified by a dynamic graph stream,
uses $\tilde{O}(n^{2-3\epsilon})$ bits of space, and outputs a matching of size $\Omega(\mathrm{opt}/n^{\epsilon})$ with high probability,
where opt is the size of a maximum matching in $G$. Moreover, the algorithm only maintains a
linear sketch during the stream.

¹ To the best of our knowledge, the only exception is the recent paper [12], which considers a promise problem
in dynamic graph streams. However, it is worth mentioning that for the non-promise version of the problem, the
algorithm given in the same work can again be viewed as a linear sketching algorithm.

² We emphasize that the result in [32] is proven for the turnstile model rather than the strict turnstile model.

We prove this result by designing a sampling-based algorithm that takes advantage of the
well-known linear sketching implementation of $\ell_0$-samplers (see Section 2.1). The algorithm
maintains a set of (edge) samplers that are coordinated in such a way that the sampled edges 
are “well-spread” across different parts of the graph and hence contain a relatively large matching. 
The main challenge is to achieve such a coordination for linear sketching based samplers. Such 
a coordination is typically achieved via sequential operations that depend on the state of the 
stream, while linear sketches are inherently oblivious to the underlying state. 

Note that our algorithm, though stated for bipartite graphs, also works for general graphs by 
applying the standard technique of choosing a random bipartition of the vertices upfront and only 
considering edges that cross the bipartition, while losing a factor of 2 in the approximation ratio. 
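The bipartition reduction can be sketched as follows, assuming 0-based vertex identifiers and a fully random side assignment fixed before the stream starts (the helper `random_bipartite_subgraph` is hypothetical). In a streaming implementation the same filter would be applied to each update $(u, v, \Delta)$ on arrival; since every edge crosses the bipartition with probability 1/2, a fixed maximum matching keeps half of its edges in expectation, which is where the factor of 2 comes from.

```python
import random

def random_bipartite_subgraph(n, edges, seed=None):
    """Keep only the edges that cross a uniformly random bipartition of 0..n-1.

    Returns (side, kept), where side[v] is 0 for the left part L and 1 for the
    right part R, and each kept edge is oriented as (left endpoint, right endpoint).
    """
    rng = random.Random(seed)
    side = [rng.randrange(2) for _ in range(n)]  # chosen upfront, before any update
    kept = []
    for u, v in edges:
        if side[u] != side[v]:  # a fixed edge crosses with probability 1/2
            kept.append((u, v) if side[u] == 0 else (v, u))
    return side, kept

cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
side, kept = random_bipartite_subgraph(6, cycle, seed=3)
```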
We further note that for weighted graphs with poly(n)-bounded weights, the standard “grouping
by weight” technique can be used to obtain a similar result for computing an approximation to
weighted matching, while losing a factor of $O(\log n)$ in the approximation ratio.

We complement our upper bound by the following (essentially) matching lower bound. 

Theorem 2. There exists a constant $c > 0$ such that for any $\epsilon > 0$, any randomized linear
sketch that can be used to recover a matching of size $\mathrm{opt}/(c \cdot n^{\epsilon})$ for every input bipartite graph
$G$ on $n$ vertices with constant probability must have worst-case space complexity of $n^{2-3\epsilon-o(1)}$
bits. Here, opt denotes the size of a maximum matching in $G$.

This result is obtained as a corollary of our lower bound on the communication complexity
of approximating maximum matchings in the number-in-hand simultaneous model (Theorem 5);
see Section 2.2 for the exact definition of this model and its connection with linear sketches.

Our construction follows the line of work by [21, 26] on using Ruzsa-Szemeredi graphs for
proving lower bounds on the space complexity of streaming algorithms for the maximum matching
problem. However, focusing on the number-in-hand simultaneous model allows us to benefit from a
different construction of Ruzsa-Szemeredi graphs that are dense, hence bypassing the limitations
of the aforementioned works, both in proving lower bounds for larger approximation ratios and the
$n^{1+\Omega(1/\log\log n)}$ barrier on the value of the space lower bound. We elaborate more on this in
Section 2.3.

Finally, we note that Theorem 1 and Theorem 2 provide (essentially) tight bounds on the
space complexity of any streaming algorithm for dynamic graph streams that only maintains a
linear sketch during the stream. This makes progress on an open problem posed at the Bertinoro
workshop on sublinear and streaming algorithms in 2014 [1], regarding the possibility of obtaining a
constant factor approximation to the maximum matching in $o(n^2)$ space.

Recent related work. Independently and concurrently with our work, Konrad [29] has also
studied the problem of designing linear sketches for approximating matchings in dynamic graph
streams. Konrad’s work shows that an $n^{\epsilon}$-approximation can be obtained using a linear sketch
of size $\tilde{O}(n^{2-2\epsilon})$, and it establishes a lower bound of $\Omega(n^{3/2-4\epsilon})$ on the size of any linear sketch
that yields an $n^{\epsilon}$-approximation. Our approaches for establishing the lower bound on the sketch
size are in the same spirit, though the techniques and constructions are quite different.

1.2 Organization 

In Section 2, we introduce the key concepts and tools used in this paper. In particular,
Section 2.1 describes $\ell_0$-samplers and how we use them in our algorithm; Section 2.2 formally
defines the number-in-hand simultaneous model and how it is connected to linear sketches; and
Section 2.3 provides a definition of Ruzsa-Szemeredi graphs and the specific construction used
in our lower bound. In Section 3, we describe a single-pass streaming algorithm for
the maximum matching problem in dynamic graph streams and prove Theorem 1. In Section 4,
we present our lower bound construction and prove Theorem 2. Finally, we conclude in
Section 5.

2 Preliminaries 

2.1 $\ell_0$-Samplers

We use the following tool developed in the streaming literature. 

Definition 2 ($\ell_0$-sampler [20]). Let $0 < \delta < 1$ be a parameter. An $\ell_0$-sampler is an algorithm
which, given access to a dynamic stream, returns FAIL with probability at most $\delta$, and otherwise
outputs an element $e$, along with the frequency $f_e$, where $e$ is uniformly distributed among the
non-zero entries of the frequency vector $f$.

We use $\ell_0$-samplers as follows. For the input graph $G(V,E)$, let $V' \subseteq V$ be a subset of
vertices; suppose we maintain an $\ell_0$-sampler over the stream where only the edges between
vertices in $V'$ are considered. At the end of the stream, we can use the $\ell_0$-sampler to recover
one edge between the vertices in $V'$, if such an edge exists.

We use the following lemma in our algorithm, which implements $\ell_0$-samplers using linear
sketches.

Lemma 2.1 ([25]). For any $0 < \delta < 1$, there is a linear sketching implementation of an $\ell_0$-sampler
for the frequency vector $f \in \mathbb{R}^n$ with probability of success $1 - \delta$, using $O(\log^2 n \cdot \log(\delta^{-1}))$ bits
of space.
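To make the notion concrete, here is a simplified linear $\ell_0$-sampler in Python. It is not the construction behind Lemma 2.1: it assumes fully random hash values stored explicitly (so it does not meet the stated space bound) and combines geometric subsampling with 1-sparse recovery via a polynomial fingerprint, a standard textbook pattern. Every stored quantity is linear in $f$, so insertions and deletions are processed identically, and the 1-sparse test relies on frequencies being non-negative (the strict turnstile setting). A level with exactly one surviving non-zero coordinate always returns the coordinate of smallest hash value, which is uniform over the support; running $O(\log(1/\delta))$ independent copies would reduce the failure probability to $\delta$.

```python
import random

class L0Sampler:
    """Simplified linear l0-sampler over coordinates 0..universe-1 (illustrative only).

    Level l keeps the coordinates whose hash value is below 2**-l (nested levels).
    Each level stores three linear functions of f:
      s0 = sum f_i,  s1 = sum i * f_i,  s2 = sum f_i * z**i (mod P).
    """

    P = (1 << 61) - 1  # Mersenne prime used for the fingerprint

    def __init__(self, universe, seed=None):
        rng = random.Random(seed)
        self.universe = universe
        self.levels = universe.bit_length() + 1
        # Simplification: fully random hash values stored explicitly (O(universe) space).
        self.h = [rng.random() for _ in range(universe)]
        self.z = rng.randrange(2, self.P - 1)
        self.s0 = [0] * self.levels
        self.s1 = [0] * self.levels
        self.s2 = [0] * self.levels

    def update(self, i, delta):
        """Process f_i += delta; every stored value changes linearly."""
        fp = delta * pow(self.z, i, self.P) % self.P
        for lvl in range(self.levels):
            if self.h[i] >= 2.0 ** -lvl:
                break  # i is absent from this level and all deeper ones
            self.s0[lvl] += delta
            self.s1[lvl] += delta * i
            self.s2[lvl] = (self.s2[lvl] + fp) % self.P

    def sample(self):
        """Return (i, f_i) from a level that passes the 1-sparse test, or None for FAIL."""
        for lvl in range(self.levels - 1, -1, -1):
            c, t = self.s0[lvl], self.s1[lvl]
            if c == 0 or t % c != 0:
                continue
            i = t // c
            if 0 <= i < self.universe and self.s2[lvl] == c * pow(self.z, i, self.P) % self.P:
                return i, c
        return None

s = L0Sampler(16, seed=7)
for i, delta in [(3, 2), (5, 1), (3, -2), (9, 4), (5, -1)]:
    s.update(i, delta)
# f now has a single non-zero entry, f_9 = 4, so every non-empty level is 1-sparse.
```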

2.2 The Number-in-Hand Simultaneous Model 

The number-in-hand simultaneous model is defined as follows. The input vector $x = x_1 + \cdots + x_k$
is partitioned adversarially between $k$ different players $P^{(1)}, \ldots, P^{(k)}$, where each player $P^{(i)}$
only sees the input $x_i$. All players have access to an infinite shared string of random bits, referred
to as public coins. The goal for the players is to compute a function $f(x)$ by simultaneously
sending a (possibly randomized, using only public randomness) message to a special party called
the coordinator, according to a pre-specified protocol. For any input $x$, the coordinator is then
required to output $f(x)$ with probability $1 - \delta$ over the randomness used in the protocol. We
refer the reader to [...] for more information about communication complexity in general.

To prove our lower bound in Theorem 2, we consider the maximum matching problem in
the number-in-hand simultaneous model, defined formally as follows. Each player $P^{(i)}$ is given
a vector $x_i \in \{0,1\}^{\binom{n}{2}}$, representing the edges of a graph $G_i(V, E_i)$, with $V = [n]$. Their goal is
to approximate the maximum matching in the multi-graph $G(V,E)$, where $E$ is represented by
the vector $x = x_1 + \cdots + x_k$.

We should note that space lower bounds for single-pass streaming algorithms are usually
obtained by proving communication complexity lower bounds in a different model of communication,
namely the one-way communication model, in which player $P^{(1)}$ speaks to $P^{(2)}$, who
speaks to $P^{(3)}$, etc., and finally $P^{(k)}$ outputs the answer. In this model, the maximum matching
problem has a simple 2-approximation algorithm using $\tilde{O}(n)$ communication per player: send a
maximal matching from each player to the next one. Since we are looking for a space complexity
of $n^{2-3\epsilon-o(1)}$, the one-way model cannot lead to our lower bound in Theorem 2.
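The one-way protocol described above can be sketched as follows (hypothetical helpers, with edges given as pairs of vertex identifiers). Each player extends the maximal matching it receives with its own edges and forwards the result, so every message has at most $n/2$ edges. After the last player, every edge of every player has a matched endpoint, so the output is a maximal matching of the union and hence a 2-approximation.

```python
def extend_maximal_matching(matching, edges):
    """Greedily add the edges whose endpoints are both unmatched; returns a new list."""
    matched = {v for e in matching for v in e}
    result = list(matching)
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            result.append((u, v))
            matched.update((u, v))
    return result

def one_way_protocol(player_edge_lists):
    """P(1) -> P(2) -> ... -> P(k): each player forwards a maximal matching of
    (received matching + its own edges); the last message is the output."""
    message = []
    for edges in player_edge_lists:
        message = extend_maximal_matching(message, edges)
    return message

players = [[(0, 1), (2, 3)], [(1, 2), (3, 4)], [(4, 5), (0, 5)]]
M = one_way_protocol(players)
```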

The following proposition enables us to consider the simultaneous model instead of the one-way
model in the proof of our space lower bound. This reduction is well-known in the literature (see [32],
for example).
Proposition 2.2. Suppose there is a linear sketch of size $s$ bits for a function $f$ from which $f$
can be computed with failure probability at most $\delta$; then for any $k \geq 1$, there exists a public-coin
number-in-hand simultaneous protocol for $k$ players to compute $f$, where each player communicates
a message of size $s$ and the coordinator is able to compute $f$ with failure probability at
most $\delta$.

Proof. The players use the public coins to construct the set of random coin tosses required to
create the matrix $A$ in the linear sketch. Then, each player computes $A \cdot x_i$ and sends it to the
coordinator. The coordinator can now compute $A \cdot x$ for $x = x_1 + \cdots + x_k$ by simply computing
$A \cdot x = A \cdot x_1 + \cdots + A \cdot x_k$, and then compute $f(x)$ from $A \cdot x$.
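The simulation in this proof can be illustrated in a few lines of Python. The shared seed plays the role of the public coins, and the small random integer matrix is an arbitrary stand-in for the sketching matrix $A$ (an assumption for illustration; the helper names are hypothetical). The check at the end is exactly the linearity identity $A \cdot x = A \cdot x_1 + \cdots + A \cdot x_k$.

```python
import random

def shared_matrix(d, m, seed):
    """Public coins: every party regenerates the same d x m matrix A from the shared seed."""
    rng = random.Random(seed)
    return [[rng.randint(-3, 3) for _ in range(m)] for _ in range(d)]

def player_message(x_i, d, seed):
    """Player P(i) sends A x_i, computed only from its own input and the public coins."""
    A = shared_matrix(d, len(x_i), seed)
    return [sum(a * x for a, x in zip(row, x_i)) for row in A]

def coordinator_sketch(messages):
    """By linearity, A x = A x_1 + ... + A x_k, so the coordinator adds the messages."""
    return [sum(column) for column in zip(*messages)]

xs = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 0]]  # x_1, x_2, x_3
x = [sum(entries) for entries in zip(*xs)]                         # x = x_1 + x_2 + x_3
combined = coordinator_sketch([player_message(x_i, d=3, seed=42) for x_i in xs])
```

Note that $x$ may have entries larger than 1 (here the fourth coordinate is 2), matching the multi-graph in the definition above.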

2.3 Ruzsa-Szemeredi graphs 

Given an undirected graph $G(V,E)$ and a set of edges $F \subseteq E$, we denote by $V(F)$ the set of
vertices which are incident on at least one edge in $F$. Moreover, we denote by $E(F)$ the set of
edges induced by $F$, i.e., $E \cap (V(F) \times V(F))$. $F$ is said to be an induced matching if no two
edges in $F$ share an endpoint and $E(F) = F$.

Definition 3 (Ruzsa-Szemeredi graph). We call a graph $G$ an $(r,t)$-Ruzsa-Szemeredi graph
($(r,t)$-RS graph for short) if the set of edges in $G$ consists of $t$ pairwise disjoint induced matchings
$M_1, \ldots, M_t$, each of size $r$.
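The definitions of induced matchings and $(r,t)$-RS graphs translate directly into a brute-force checker. The sketch below (hypothetical helpers, edges represented as `frozenset` pairs) is only meant to make the conditions $E(F) = F$ and pairwise disjointness explicit, not to be efficient.

```python
def is_induced_matching(F, E):
    """Check that F is an induced matching of the graph with edge set E.

    Edges are frozensets {u, v}; F and E are sets of such edges, with F a subset of E.
    """
    V_F = set()
    for e in F:
        if V_F & e:
            return False  # two edges of F share an endpoint
        V_F |= e
    E_F = {e for e in E if e <= V_F}  # E(F) = E ∩ (V(F) x V(F))
    return E_F == set(F)

def is_rs_graph(matchings, r):
    """Check Definition 3: the edge sets are pairwise disjoint induced matchings of size r."""
    E = set().union(*matchings)
    if sum(len(M) for M in matchings) != len(E):
        return False  # some edge appears in two of the matchings
    return all(len(M) == r and is_induced_matching(M, E) for M in matchings)

def edge(u, v):
    return frozenset((u, v))

two_edges = [{edge(0, 1)}, {edge(2, 3)}]  # t = 2 induced matchings of size r = 1
four_cycle = [{edge(0, 1), edge(2, 3)}, {edge(1, 2), edge(3, 0)}]
```

The 4-cycle fails because each of its two perfect matchings spans all four vertices and therefore induces the other two edges as well.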

In general, graphs of this type are of interest when $r$ and $t$ are relatively large as a function of
the number of vertices in the graph. The first construction of an $(r,t)$-RS graph was given by Ruzsa
and Szemeredi [...] with parameters $r = [\ldots]$ and $t = [\ldots]$. By now, there are several known
constructions of these graphs with different ranges of the parameters $r$ and $t$ [...] (see [...] for
more information). In particular, Fischer et al. [...] introduced a construction with parameters
$r = (1 - o(1)) \cdot [\ldots]$ and $t = n^{\Omega(1/\log\log n)}$. This construction was further used and improved
by [21, 25] to obtain their aforementioned lower bound of $n^{1+\Omega(1/\log\log n)}$ on the space complexity
of streaming algorithms for the maximum matching problem in insertion-only streams.

We use the construction of $(r,t)$-RS graphs given by Alon et al. [...], which is summarized in
the following theorem.

Theorem 3 ([...]). For any sufficiently large $N$, there exists an $(r,t)$-RS graph on $N$ vertices
with $r = N^{1-o(1)}$ and $r \cdot t = \binom{N}{2} - o(N^2)$.

3 An $O(n^{\epsilon})$-approximation using $\tilde{O}(n^{2-3\epsilon})$ space

In this section, we present our algorithm for computing an approximate maximum matching in 
the dynamic graph streams and prove the following theorem. 

Theorem 4. There is a single-pass randomized streaming algorithm that takes as input a parameter
$0 < \epsilon < 1/2$ and a bipartite graph $G$ with $n$ vertices specified by a dynamic graph stream,
uses $O(n^{2-3\epsilon} \cdot \mathrm{polylog}(n))$ bits of space, and, with high probability, outputs a matching of size
$\Omega(\mathrm{opt}/n^{\epsilon})$, where opt is the size of a maximum matching in $G$.

In the following, whenever we use $\ell_0$-samplers, we always apply Lemma 2.1 with parameter
$\delta = n^{-5}$. Since the number of $\ell_0$-samplers used by our algorithm is bounded by $O(n^2)$, with high
probability none of them fails. In the rest of this section, we always assume this is the case
for all $\ell_0$-samplers we use, and we do not explicitly account for the probability of $\ell_0$-sampler
failure in our proofs.

For simplicity, we assume that the algorithm is provided with a value that is a 2-approximation
of opt, i.e., of the size of a maximum matching in $G$. This is without loss of
generality, since we can run our algorithm for $O(\log n)$ different estimates of opt in parallel
and output the largest matching among the matchings found for all estimates. In addition, we
can assume $\mathrm{opt} > n^{\epsilon}$, since otherwise a single edge is an $n^{\epsilon}$-approximation of the maximum
matching, which can be obtained by maintaining an $\ell_0$-sampler over all edges in the graph.


Algorithm 1 A single-pass dynamic streaming algorithm for the maximum matching problem.

• Input: A bipartite graph $G(L, R, E)$ with $n$ vertices on each side, specified by a dynamic
graph stream, a parameter $0 < \epsilon < 1/2$, and a 2-approximation to the size of a maximum
matching in $G$ as opt.

• Output: A matching $M$ with size $\Omega(\mathrm{opt}/n^{\epsilon})$.

• Pre-processing:

1. Let $\alpha = \frac{\mathrm{opt}}{n^{\epsilon}}$, $\beta = \frac{6 \cdot \mathrm{opt}}{n^{2\epsilon}} \cdot \log n$, and $\gamma = 4n^{\epsilon}$.

2. Create two collections $\mathcal{L}$ and $\mathcal{R}$, each containing $\alpha$ sets (called groups). Create two $\gamma$-wise
independent hash functions $h_L : L \to \mathcal{L}$ and $h_R : R \to \mathcal{R}$. Assign each vertex $u \in L$ (resp.
$v \in R$) to the group $h_L(u) \in \mathcal{L}$ (resp. $h_R(v) \in \mathcal{R}$).

3. For each $L_i \in \mathcal{L}$, assign $\beta$ groups in $\mathcal{R}$ to $L_i$, chosen independently and uniformly at random
with replacement. For each $R_j$ assigned to $L_i$, we say $R_j$ is an active partner of $L_i$ and
$(L_i, R_j)$ form an active pair.

• Streaming updates:

* For each $L_i \in \mathcal{L}$ and each of its active partners $R_j \in \mathcal{R}$, maintain an $\ell_0$-sampler over the
edges between the vertices in $L_i$ and $R_j$.

• Post-processing:

* Sample one edge from each maintained $\ell_0$-sampler and compute a maximum matching $M$
over the sampled edges.
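For intuition, Algorithm 1 can be simulated offline in Python. This is a sketch under explicit simplifications: fully random group assignments replace the $\gamma$-wise independent hash functions, each $\ell_0$-sampler is replaced by a uniformly random choice among the final edges of its active pair (what an ideal sampler would return once all insertions and deletions are applied), the rounding of $\alpha$ and $\beta$ and the use of the natural logarithm are our own choices, and `algorithm1_offline` is a hypothetical name. Duplicate active partners are merged, since two samplers on the same pair see the same edges. The final maximum matching uses Kuhn's augmenting-path algorithm.

```python
import math
import random

def max_bipartite_matching(edges):
    """Maximum matching on (left, right) edge pairs via Kuhn's augmenting paths."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    match_r = {}  # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_r or try_augment(match_r[v], seen):
                match_r[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return [(u, v) for v, u in match_r.items()]

def algorithm1_offline(n, edges, eps, opt_est, seed=None):
    """Offline simulation of Algorithm 1 on bipartite edges (u in L, v in R), 0-based.

    Simplifications (assumptions, not the paper's implementation): fully random
    group assignment instead of gamma-wise independent hashing, and an exact
    uniform choice among a pair's edges instead of an l0-sampler.
    """
    rng = random.Random(seed)
    alpha = max(1, round(opt_est / n ** eps))
    beta = max(1, math.ceil(6 * opt_est / n ** (2 * eps) * math.log(n)))
    h_L = [rng.randrange(alpha) for _ in range(n)]  # group of each left vertex
    h_R = [rng.randrange(alpha) for _ in range(n)]  # group of each right vertex
    # Each L_i draws beta active partners with replacement; duplicates are merged.
    active = {(i, rng.randrange(alpha)) for i in range(alpha) for _ in range(beta)}
    buckets = {}  # active pair -> edges its sampler would see
    for u, v in edges:
        pair = (h_L[u], h_R[v])
        if pair in active:
            buckets.setdefault(pair, []).append((u, v))
    sampled = [rng.choice(es) for es in buckets.values()]  # one edge per sampler
    return max_bipartite_matching(sampled)

M = algorithm1_offline(64, [(i, i) for i in range(64)], eps=0.25, opt_est=64, seed=5)
```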


The space complexity of Algorithm 1 is easy to verify. The algorithm stores two $\gamma$-wise
independent hash functions $h_L$ and $h_R$ to assign vertices to their groups, which requires $\tilde{O}(\gamma) =
\tilde{O}(n^{\epsilon})$ bits of space [35]. $\tilde{O}(\alpha \cdot \beta)$ truly random bits are needed for identifying the active
partners of each group in $\mathcal{L}$, and $O(\alpha \cdot \beta)$ $\ell_0$-samplers are maintained for the active pairs during
the stream, where each of them requires $O(\log^3 n)$ bits of space (Lemma 2.1 with $\delta = n^{-5}$). Hence, the total
space complexity of the algorithm is:

$$\tilde{O}(n^{\epsilon} + \alpha \cdot \beta) = \tilde{O}\left(n^{\epsilon} + \frac{\mathrm{opt}^2}{n^{3\epsilon}}\right) = \tilde{O}(n^{2-3\epsilon}),$$

where the last equality holds since $\mathrm{opt} \leq n$ and, by the choice of $\epsilon < 1/2$, $n^{\epsilon} \leq n^{2-3\epsilon}$.

We now prove the correctness of the algorithm. Fix a maximum matching $M^*$ in $G$ with
size opt. The following concentration bound ensures that each group in $\mathcal{L}$ and $\mathcal{R}$ contains
$(1 \pm 0.001)n^{\epsilon}$ vertices of the maximum matching $M^*$.

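Claim 3.1 ([38]). If $X$ is a sum of $k$-wise independent random variables taking values in $[0,1]$,
and $\mu = \mathbb{E}[X]$, then:

$$\Pr(|X - \mu| \geq \epsilon' \mu) \leq \exp(-\lfloor k/2 \rfloor) \qquad \forall\, \epsilon' \leq 1,\ k \leq \lfloor \epsilon'^2 \mu e^{-1/3} \rfloor.$$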
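One standard way to realize the $\gamma$-wise independent hash functions $h_L$ and $h_R$ of Algorithm 1 is a random polynomial of degree $k-1$ over a prime field, stored as $k$ coefficients. The Python sketch below is such a construction; the class name, the Mersenne prime $2^{61}-1$, and the final reduction modulo the number of groups are our illustrative choices. The last `% m` step makes the output only approximately uniform unless $m$ divides $p$. The usage lines assign vertices to groups, as in step 2 of Algorithm 1.

```python
import random

class KWiseHash:
    """h(x) = ((a_0 + a_1 x + ... + a_{k-1} x^{k-1}) mod p) mod m with random a_j.

    Before the final "mod m", the values at any k distinct points in [0, p) are
    independent and uniform over F_p; afterwards they are only approximately
    uniform over range(m).
    """

    def __init__(self, k, m, p=(1 << 61) - 1, seed=None):
        rng = random.Random(seed)
        self.coeffs = [rng.randrange(p) for _ in range(k)]  # k field elements
        self.p, self.m = p, m

    def __call__(self, x):
        acc = 0
        for a in reversed(self.coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + a) % self.p
        return acc % self.m

# Assign 10,000 vertices to 10 groups, as in step 2 of Algorithm 1.
h = KWiseHash(k=8, m=10, seed=1)
loads = [0] * 10
for v in range(10000):
    loads[h(v)] += 1
```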
For simplicity, in the following, we assume every group has exactly $n^{\epsilon}$ vertices of $M^*$.³ For
any group $L_i \in \mathcal{L}$ (resp. $R_j \in \mathcal{R}$), we refer to the edges in $M^*$ that are incident on $L_i$ (resp.
$R_j$) as the matching edges of this group. Since $M^*$ is a matching, the number of matching edges
of each group is also $n^{\epsilon}$.

We say a pair $(L_i, R_j)$ is matchable by $M^*$ if $L_i$ and $R_j$ share at least one matching edge.
The general idea of the proof is to show that among all active pairs $(L_i, R_j)$, there is a subset
$\mathcal{M} \subseteq \mathcal{L} \times \mathcal{R}$ of $\Omega(\mathrm{opt}/n^{\epsilon})$ active pairs with the following two properties:

(i) Each pair is matchable by $M^*$.

(ii) No two pairs in $\mathcal{M}$ share the same endpoint $L_i$ or $R_j$.

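Intuitively, properties (i) and (ii) together ensure that there exists a “matching” between the
groups in $\mathcal{L}$ and $\mathcal{R}$ of size $\Omega(\mathrm{opt}/n^{\epsilon})$. Since we maintain an $\ell_0$-sampler for each active pair in $\mathcal{M}$,
and each matchable active pair contains at least one edge in $G$, the $\ell_0$-samplers for the matchable
active pairs will return $|\mathcal{M}|$ edges. Since the groups partition the vertices and no two pairs in $\mathcal{M}$
share a group, these edges are vertex-disjoint and hence form a matching of size $|\mathcal{M}| = \Omega(\mathrm{opt}/n^{\epsilon})$ in
graph $G$.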

To prove the existence of such a set $\mathcal{M}$, we start by arguing that there are $\Omega(\mathrm{opt}/n^{\epsilon})$ groups
$L_i$ in $\mathcal{L}$ such that (essentially) $\Omega(n^{\epsilon})$ matching edges of $L_i$ are incident on $\Omega(n^{\epsilon})$ distinct groups
in $\mathcal{R}$. Consequently, when the algorithm randomly assigns $\beta = \Omega(\alpha/n^{\epsilon})$ groups in $\mathcal{R}$ to $L_i$, since
$|\mathcal{R}| = \alpha$, with high probability at least one of the active pairs $(L_i, R_j)$ is matchable by $M^*$.
This ensures that we have $\Omega(\mathrm{opt}/n^{\epsilon})$ matchable active pairs $(L_i, R_j)$ where all $L_i$'s are distinct.
Finally, we show that a constant fraction of these matchable active pairs also have distinct $R_j$'s,
with a constant probability, proving property (ii).

We now provide the formal proof. To continue, we need the following definitions. We say
a group $L_i \in \mathcal{L}$ is spanning if the matching edges of $L_i$ are incident on at least $\min\{n^{\epsilon}, \alpha\}/3$
different $R_j \in \mathcal{R}$. We say that $L_i$ preserves an edge in $M^*$ if $L_i$ belongs to at least one matchable
active pair.

Lemma 3.2. With probability at least $1 - 1/n$, every spanning $L_i \in \mathcal{L}$ preserves an edge in $M^*$.


Proof. We argue that if $L_i$ is spanning, then $L_i$ preserves an edge in $M^*$ with probability at
least $1 - 1/n^2$. Then, by applying a union bound over all (at most $\alpha \leq n$) spanning $L_i$, with
probability at least $1 - 1/n$, every spanning $L_i$ preserves an edge in $M^*$.

For any spanning $L_i$, there are $\min\{n^{\epsilon}, \alpha\}/3$ different $R_j$'s such that $M^*$ contains an edge
between $L_i$ and $R_j$, i.e., $(L_i, R_j)$ is matchable by $M^*$. Recall that $L_i$ is assigned
$\beta = (6\alpha \log n)/n^{\epsilon}$ groups in $\mathcal{R}$ uniformly at random.

If $\alpha/3$ different $R_j$'s are matchable with $L_i$ by $M^*$, assigning $2\log n$ random groups in $\mathcal{R}$ to
$L_i$ suffices to ensure that with probability at least $1 - 1/n^2$, $L_i$ preserves an edge in $M^*$.

If $n^{\epsilon}/3$ different $R_j$'s are matchable with $L_i$ by $M^*$, the probability that a spanning $L_i$ does
not preserve any edge in $M^*$ is at most

$$\left(1 - \frac{n^{\epsilon}}{3\alpha}\right)^{\beta} \leq \exp\left(-\frac{n^{\epsilon}}{3\alpha} \cdot \beta\right) = \exp\left(-\frac{n^{\epsilon}}{3\alpha} \cdot \frac{6\alpha \log n}{n^{\epsilon}}\right) \leq \frac{1}{n^2}.$$

Lemma 3.3. With a constant probability, at least 1/4 of the $L_i$'s are spanning.

Proof. We use the following simple balls and bins argument (see Appendix A.1 for a proof).

Claim 3.4. Suppose we assign $x$ balls to $y$ bins independently and uniformly at random. With
probability at least 1/2, the number of non-empty bins is at least $\min\{x, y\}/3$.

³ One can simply substitute $c \in [0.999n^{\epsilon}, 1.001n^{\epsilon}]$ in the following equations instead of $n^{\epsilon}$ and obtain the same result
with a slight change in the constants.
Fix an $L_i \in \mathcal{L}$. Consider each $R_j \in \mathcal{R}$ as a bin and each matching edge of $L_i$ as a ball. An
edge $(u, v)$ ($v \in R$), i.e., a ball, is assigned to the bin $R_j$ iff the group assigned to vertex $v$ is $R_j$.
The number of balls here is $n^{\epsilon}$, and since we use a $\gamma$-wise independent hash function ($\gamma > n^{\epsilon}$)
to assign the balls to the bins, all these $n^{\epsilon}$ balls are assigned independently. By Claim 3.4, at
least $\min\{n^{\epsilon}, \alpha\}/3$ different $R_j$'s have edges in $M^*$ that are incident on $L_i$ and $R_j$ (hence $L_i$ is
spanning), with probability at least 1/2. Hence the expected number of non-spanning $L_i$'s is at most
$\alpha/2$, and by Markov's inequality, with probability at least 1/3 fewer than $3\alpha/4$ of them are
non-spanning; that is, at least $\alpha/4$ $L_i$'s are spanning.
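Claim 3.4 is easy to sanity-check numerically. The Python sketch below (hypothetical helpers; fully random throws, which is what the $\gamma$-wise independent hash provides for the $n^{\epsilon} < \gamma$ balls of a single group) compares the exact expectation $y(1 - (1 - 1/y)^x)$ of the number of non-empty bins with an empirical estimate of the probability in the claim. It is an illustration only; the proof is in Appendix A.1.

```python
import random

def expected_nonempty(x, y):
    """Exact E[#non-empty bins] when x balls are thrown independently into y bins."""
    return y * (1 - (1 - 1 / y) ** x)

def nonempty_bins(x, y, rng):
    """Throw x balls uniformly into y bins and count the occupied bins."""
    return len({rng.randrange(y) for _ in range(x)})

def claim_probability(x, y, trials=2000, seed=0):
    """Empirical Pr[#non-empty bins >= min(x, y)/3]."""
    rng = random.Random(seed)
    hits = sum(nonempty_bins(x, y, rng) >= min(x, y) / 3 for _ in range(trials))
    return hits / trials

for x, y in [(10, 100), (100, 10), (50, 50)]:
    print(x, y, round(expected_nonempty(x, y), 2), claim_probability(x, y))
```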

Lemma 3.5. With a constant probability, $\Omega(\alpha)$ groups $R_j$ in $\mathcal{R}$ are active partners of distinct
spanning $L_i \in \mathcal{L}$, such that $L_i$ and $R_j$ are matchable by $M^*$.

Proof. Suppose each spanning $L_i$, when picking the $R_j$'s, only keeps the first $R_j$ such that $L_i$
and $R_j$ are matchable by $M^*$ (picking more can only increase the size of the final matching).
We only need to show that the number of distinct $R_j$'s that are kept by the $L_i$'s is $\Omega(\alpha)$.

Suppose $n^{\epsilon} < \alpha$; the other case, when $n^{\epsilon} \geq \alpha$, is easy since each spanning $L_i$ is
matchable with a 1/3 fraction of the groups in $\mathcal{R}$. By Lemma 3.3, there are $\alpha/4$ spanning $L_i$'s
with constant probability. Therefore, there are $(n^{\epsilon}/3) \cdot (\alpha/4) = (n^{\epsilon}/3) \cdot (\mathrm{opt}/(4n^{\epsilon})) = \mathrm{opt}/12$
edges in $M^*$ incident on all the spanning $L_i$'s; we denote these $\mathrm{opt}/12$ edges of $M^*$ by $M'$.
Since each group in $\mathcal{R}$ has $n^{\epsilon}$ matching edges, and $\alpha = \mathrm{opt}/n^{\epsilon}$, it must be that at least $\alpha/24$
groups in $\mathcal{R}$ contain at least $n^{\epsilon}/23$ vertices incident on $M'$; otherwise, the total number of
edges in $M'$ is less than

$$\frac{\alpha}{24} \cdot n^{\epsilon} + \frac{23\alpha}{24} \cdot \frac{n^{\epsilon}}{23} = \frac{\mathrm{opt}}{24} + \frac{\mathrm{opt}}{24} = \frac{\mathrm{opt}}{12}.$$

Let $\mathcal{R}'$ be the set of all these $\alpha/24$ groups in $\mathcal{R}$. Conditioned on the event that $L_i$ preserves
an edge in $M^*$, for each of the $R_j$ groups that are matchable with $L_i$ by $M^*$ (there are at most
$n^{\epsilon}$ such groups), the probability that $R_j$ is kept by $L_i$ is at least $1/n^{\epsilon}$. Therefore, for each of
these $\alpha/24$ groups $R_j$, the probability that $R_j$ is not kept by any spanning $L_i$ is at most

$$\left(1 - \frac{1}{n^{\epsilon}}\right)^{n^{\epsilon}/23} \leq \exp\left(-\frac{n^{\epsilon}}{23\, n^{\epsilon}}\right) = e^{-1/23}.$$

Hence the expected number of groups in $\mathcal{R}'$ that are not kept by any spanning $L_i$ is
at most $e^{-1/23} \cdot \alpha/24$, a constant fraction strictly smaller than $|\mathcal{R}'|$. By Markov's inequality
(applied to the number of groups in $\mathcal{R}'$ that are not kept), with a constant probability, $\Omega(\alpha)$
different $R_j \in \mathcal{R}'$ will be kept by some spanning $L_i$.

Note that the probability of success can be boosted to any constant by allowing each $L_i$ to
repeatedly pick $\beta$ groups from $\mathcal{R}$ as active partners a constant number of times.

Proof (of Theorem 4). By Lemma 3.5, $\Omega(\alpha)$ groups in $\mathcal{R}$ will be assigned to $\Omega(\alpha)$ distinct
spanning groups in $\mathcal{L}$; moreover, every such pair is matchable by $M^*$, so there exists at least
one edge between the two groups of each of these pairs. By picking one edge for each of these
pairs, using the $\ell_0$-sampler of each such active pair $(L_i, R_j)$, we obtain a matching of size
$\Omega(\alpha)$. Therefore, in the post-processing step, the algorithm can find a matching of size
$\Omega(\alpha) = \Omega(\mathrm{opt}/n^{\epsilon})$.


4 An $n^{2-3\epsilon-o(1)}$ lower bound for $O(n^{\epsilon})$-approximation

In this section, we provide our lower bound result for approximating the maximum matching
using linear sketches. As stated in Section 2.2, we only need to prove the lower bound for the
number-in-hand simultaneous model; the rest follows from Proposition 2.2.



Theorem 5. There exists a constant $c > 0$ such that for any $\epsilon > 0$, any protocol for approximating
the maximum matching to within a factor of $c \cdot n^{\epsilon}$ on every graph $G$ with $n$ vertices, in the
number-in-hand simultaneous model with $k = n^{\epsilon+o(1)}$ players, has to communicate $n^{2-3\epsilon-o(1)}$
bits from at least one player.

Note that although we state Theorem 5 for general graphs, the reduction mentioned after
Theorem 1 implies the same lower bound for bipartite graphs.

By Yao’s principle, it suffices to prove the lower bound on the communication complexity
of deterministic protocols on some fixed distribution over the inputs (known to the players). We
provide the following distribution as a hard input distribution for every deterministic protocol.


The hard input distribution (for any $\epsilon > 0$ and any sufficiently large integer $N$)

• Parameters: $r$, $t$, $k$, $n$, $\alpha$:

$$r = N^{1-o(1)}, \quad t = \frac{\binom{N}{2} - o(N^2)}{r}, \quad k = \left(\frac{N^{1+\epsilon}}{r}\right)^{1/(1-\epsilon)}, \quad n = k \cdot N, \quad \alpha = n^{\epsilon}.$$

• For each player $P^{(i)}$ ($i \in [k]$) independently,

1. Create a set of $N$ vertices $V_i$ and construct an arbitrary $(r,t)$-RS graph over $V_i$.

2. Pick $\lambda \in [t]$ uniformly at random and let $V_i^*$ be the set of vertices matched in the
induced matching $M_{\lambda}$.

3. For each of the $t$ induced matchings, drop half of the edges uniformly at random.

• Pick a random permutation $\pi$ of $[n]$. For every player $P^{(i)}$, let the label of $v_j$ be $\pi(j)$
for every $v_j \in V_i \setminus V_i^*$ and let the label of $u_j$ be $\pi(N + (i-1) \cdot 2r + j)$ for $u_j \in V_i^*$. Note
that the vertices with the same label correspond to the same vertex in the final graph.
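The sampling procedure above can be sketched in Python for a given RS graph. The sketch assumes 0-based indices, so player $i \in \{0, \ldots, k-1\}$ gives its private vertices the pre-permutation labels $N + 2ri + j$ with $j \in \{0, \ldots, 2r-1\}$, and it checks that $N + 2rk \le n = kN$ so these labels fit. The helper name and the toy RS graph in the usage line (two induced matchings of size 2 on 8 vertices) are illustrative only and far from the dense construction of Theorem 3.

```python
import random

def hard_instance(rs_matchings, N, k, seed=None):
    """Sample per-player edge lists from the hard distribution for one fixed RS graph.

    rs_matchings: the t induced matchings, each a list of r pairs (u, v) over 0..N-1.
    Returns (n, players, private), where players[i] lists P(i)'s edges as final
    labels in [0, n) and private[i] is the index of P(i)'s private matching.
    """
    rng = random.Random(seed)
    r = len(rs_matchings[0])
    n = k * N
    assert N + 2 * r * k <= n, "private labels must fit into [0, n)"
    pi = list(range(n))
    rng.shuffle(pi)  # the random permutation pi of the n labels
    players, private = [], []
    for i in range(k):
        lam = rng.randrange(len(rs_matchings))  # index of the private matching M_lambda
        private.append(lam)
        star = {}  # V_i^*: endpoint of M_lambda -> fresh pre-permutation label
        for u, v in rs_matchings[lam]:
            for w in (u, v):
                star[w] = N + 2 * r * i + len(star)

        def label(w):
            # private vertices get unique labels; all other vertices are shared
            return pi[star[w]] if w in star else pi[w]

        edges = []
        for M in rs_matchings:
            kept = rng.sample(M, len(M) // 2)  # drop half of the edges uniformly at random
            edges.extend((label(u), label(v)) for u, v in kept)
        players.append(edges)
    return n, players, private

n, players, private = hard_instance([[(0, 1), (2, 3)], [(4, 5), (6, 7)]], N=8, k=3, seed=2)
```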


Several remarks are in order. First, one can easily verify the following relation between the parameters:

$$k = \frac{\alpha N}{r} = n^{\varepsilon} N^{o(1)} = n^{\varepsilon + o(1)}.$$

Second, for this choice of the parameters $r$, $t$, and $N$, Theorem 3 guarantees that such an $(r,t)$-RS graph with $N$ vertices exists. Moreover, the vertices in $V^*_i$ are assigned unique labels across all players, while the vertices in $V_i \setminus V^*_i$ of every player are assigned labels from the common set $\pi([N])$. Consequently, the final graph is a multi-graph with $n$ vertices and $O(kN^2) = O(n^{2-\varepsilon-o(1)})$ edges in total (counting multiplicities). We now briefly describe the intuition behind this distribution.

Each player $P^{(i)}$ is given an $(r,t)$-RS graph with half of the edges discarded uniformly at random from each of the $t$ induced matchings. Moreover, only a single induced matching is “private”, and the vertices that are not incident on this matching are shared among all players. In addition, the identities of the private matching and of the shared vertices are unknown to the players. Intuitively, for any deterministic protocol over this distribution, every player has to send enough information for the coordinator to recover a large fraction of the edges from every induced matching; otherwise, the coordinator is not guaranteed to recover a large enough matching. We now make this intuition formal.

We say a vertex $v \in V$ is good if it belongs to some $V^*_i$ for $i \in [k]$. We say a matching $M$ is trivial if the total number of good vertices matched in $M$ is at most $N$.
Claim 4.1. Let $M^*$ be a maximum matching in $G$ and let $M$ be any trivial matching. Then

$$\frac{|M|}{|M^*|} \le \frac{4}{\alpha}.$$

Proof. Since $M^*$ is a maximum matching, it contains at least $kr/2$ edges (just using the surviving edges of the private induced matching between the good vertices of each player). On the other hand, since $M$ is a trivial matching, its size is at most the number of vertices shared by all players plus the number of good vertices matched in $M$, which is at most $2N$. Since $k = \alpha N/r$,

$$\frac{|M|}{|M^*|} \le \frac{2N}{k \cdot r/2} = \frac{2N}{\alpha N/2} = \frac{4}{\alpha}.$$

□
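The arithmetic in the last step can be checked with exact rationals. The hypothetical helper below plugs $k = \alpha N / r$ into the two bounds used in the proof and returns their ratio, which simplifies to $4/\alpha$ regardless of $N$ and $r$; it is an illustration of the calculation, not part of the proof.

```python
from fractions import Fraction


def trivial_matching_ratio_bound(alpha, N, r):
    """Bound on |M| / |M*| from Claim 4.1, with k = alpha * N / r players."""
    k = Fraction(alpha * N, r)
    max_trivial = 2 * N               # shared vertices plus at most N good vertices
    min_optimum = k * Fraction(r, 2)  # r/2 surviving private edges per player
    return max_trivial / min_optimum
```

For example, with $\alpha = 8$, $N = 1000$, and $r = 50$ there are $k = 160$ players and the bound is $2000/4000 = 1/2 = 4/\alpha$.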

Our goal is to prove that in any protocol in which each player transmits a small message, the expected number of good vertices matched by the final matching is small. In other words, the coordinator would only be able to recover a trivial matching.

Recall that $G_i(V, E_i)$ is the graph given to the player $P^{(i)}$ and $V = [n]$. With a slight abuse of notation, we refer to the induced subgraph of $G_i$ obtained by removing all isolated vertices as the graph $G_i$ itself, since this graph is effectively the real input to the player $P^{(i)}$. Moreover, note that picking the permutation $\pi$ ensures that the labels of the vertices in $G_i$ are chosen uniformly at random from $[n]$ and hence reveal no extra information to the player $P^{(i)}$. Let $\mathcal{G}_i$ be the set of all possible graphs that $G_i$ can be. Since the edges of $G_i$ are obtained by dropping half of the edges uniformly at random from each induced matching of an $(r,t)$-RS graph, $|\mathcal{G}_i| = \binom{r}{r/2}^{t}$. Moreover, in the input distribution, $G_i$ is chosen from $\mathcal{G}_i$ uniformly at random.

For any subset $F \subseteq \mathcal{G}_i$, we define the graph $G_F$ as the intersection graph of all graphs in $F$, i.e., an edge belongs to $G_F$ iff it belongs to every graph in $F$.
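A small Python sketch of these objects, assuming graphs are represented as sets of edges in a fixed canonical orientation (the function names `intersection_graph` and `heavy_indices` are illustrative, not from the paper): `intersection_graph` computes $G_F$, and `heavy_indices` computes, for a given $\beta$, the set $I_F$ of induced matchings that keep at least $2^{\beta}$ edges in $G_F$, as used in Lemma 4.2.

```python
def intersection_graph(family):
    """G_F: the edges that belong to every graph in the family F.

    Graphs are given as collections of edges; edges must use one
    canonical orientation, e.g. (u, v) with u < v.
    """
    graphs = [frozenset(g) for g in family]
    if not graphs:
        raise ValueError("F must be non-empty")
    return frozenset.intersection(*graphs)


def heavy_indices(family, matchings, beta):
    """I_F: indices j such that G_F keeps at least 2**beta edges of matching j."""
    g_f = intersection_graph(family)
    return [j for j, M in enumerate(matchings)
            if sum(1 for e in M if e in g_f) >= 2 ** beta]
```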

Lemma 4.2. For any $i \in [k]$, any subset $F \subseteq \mathcal{G}_i$, and any integer $\beta \ge 0$, let $I_F \subseteq [t]$ be the set of indices such that for any $j \in I_F$, $G_F$ contains at least $2^{\beta}$ edges from the $j$-th induced matching. If $|F| \ge 2^{-rt/(4\alpha \log n)}\,|\mathcal{G}_i|$, then $|I_F| \le \frac{rt}{2^{\beta+2}\,\alpha \log n}$.

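Proof. Let $\gamma = |I_F|$. For every $j \in I_F$, every graph in $F$ contains the (at least $2^{\beta}$) edges of the $j$-th induced matching that lie in $G_F$, and fixing $b$ of the $r/2$ surviving edges of a matching leaves at most a $\binom{r-b}{r/2-b} / \binom{r}{r/2} \le 2^{-b}$ fraction of the choices for that matching. We can therefore upper bound the size of $F$ as follows:

$$|F| \le \binom{r}{r/2}^{t-\gamma} \binom{r - 2^{\beta}}{r/2 - 2^{\beta}}^{\gamma} \le 2^{-\gamma \cdot 2^{\beta}}\,|\mathcal{G}_i|.$$

Therefore, $\gamma > \frac{rt}{2^{\beta+2}\,\alpha \log n}$ implies $|F| < 2^{-rt/(4\alpha \log n)}\,|\mathcal{G}_i|$, a contradiction.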
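The counting step rests on the identity $\binom{r-b}{r/2-b} \big/ \binom{r}{r/2} = \prod_{l=0}^{b-1} \frac{r/2-l}{r-l}$, where each factor is at most $1/2$. The sketch below (hypothetical helper `fixed_edges_ratio`, using Python's `math.comb`) evaluates the ratio exactly so the bound $2^{-b}$ can be spot-checked for small even $r$.

```python
from fractions import Fraction
from math import comb


def fixed_edges_ratio(r, b):
    """Fraction of the ways to keep r/2 of r matching edges that contain b fixed edges.

    Equals C(r - b, r/2 - b) / C(r, r/2), which the proof of Lemma 4.2 bounds by 2**-b.
    """
    assert r % 2 == 0 and 0 <= b <= r // 2
    return Fraction(comb(r - b, r // 2 - b), comb(r, r // 2))
```

For instance, `fixed_edges_ratio(4, 1)` is exactly $1/2$ and `fixed_edges_ratio(4, 2)` is $1/6 \le 1/4$.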

Lemma 4.3. Suppose that for each $i \in [k]$, the player $P^{(i)}$ sends a message of size at most

$$s = \frac{r \cdot t}{5\alpha \cdot \log n}$$

bits to the coordinator. Then the expected number of good vertices that are matched in the matching computed by the coordinator is at most $N/2$.
Proof. Fix an index $i \in [k]$ and the player $P^{(i)}$. Let $X_i$ denote the random variable counting the number of good vertices from the graph provided to the player $P^{(i)}$ that are matched by the coordinator. In the following, we prove that

$$\mathbb{E}[X_i] \le \frac{r}{2\alpha}. \qquad (1)$$

Having this, for $X = \sum_{i \in [k]} X_i$, by linearity of expectation, we have $\mathbb{E}[X] \le kr/(2\alpha) = N/2$, implying that the expected number of good vertices matched by the coordinator is at most $N/2$.

Suppose the coordinator knows the inputs of all players except player $P^{(i)}$, i.e., everything except the graph $G_i$. Note that this is the maximum information the coordinator can obtain from the other players. Define $\phi_i : \mathcal{G}_i \to \{0,1\}^s$ as the deterministic mapping used by the player $P^{(i)}$ to map its input graph to an $s$-bit message sent to the coordinator. Define the function $\Gamma_i : \mathcal{G}_i \to 2^{\mathcal{G}_i}$ such that for any $G \in \mathcal{G}_i$, $\Gamma_i(G) = \{H \in \mathcal{G}_i \mid \phi_i(G) = \phi_i(H)\}$.

The important observation is that, since the protocol is deterministic, the coordinator can output an edge $e \in G_i$ as a matching edge for the player $P^{(i)}$ only if $e$ is part of every graph in $\Gamma_i(G_i)$. We define $\mathcal{E}_i$ to be the event that, for the graph $G_i$, $|\Gamma_i(G_i)| < 2^{-rt/(4\alpha \log n)}\,|\mathcal{G}_i|$.
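The role of $\Gamma_i$ can be illustrated on a toy instance. In the sketch below, assumed for illustration only, `all_inputs` enumerates $\mathcal{G}_i$ by keeping exactly half of each induced matching (so it has $\binom{r}{r/2}^t$ elements), and `safe_edges` returns the edges common to every graph that produces the player's message, i.e., the only edges a deterministic coordinator can safely report. The message function `phi` is a hypothetical stand-in for $\phi_i$.

```python
from itertools import combinations, product


def all_inputs(matchings):
    """Enumerate G_i: every way to keep exactly half of each induced matching."""
    choices = [list(combinations(M, len(M) // 2)) for M in matchings]
    return [frozenset(e for part in pick for e in part) for pick in product(*choices)]


def safe_edges(g, inputs, phi):
    """Edges in every H with phi(H) == phi(g), i.e. in every graph of Gamma_i(g)."""
    msg = phi(g)
    fiber = [h for h in inputs if phi(h) == msg]  # Gamma_i(g); contains g itself
    return frozenset.intersection(*fiber)
```

With two matchings of two edges each and the 1-bit message "is $(0,1)$ kept?", the input $\{(0,1),(4,5)\}$ has safe edge set $\{(0,1)\}$: the coordinator learns nothing it can use about the second matching.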

The following claim can be proven using a simple counting argument (see Appendix A.2 for a proof).

Claim 4.4. For any $i \in [k]$, $\Pr(\mathcal{E}_i) \le 2^{s - rt/(4\alpha \log n)} = 2^{-rt/(20\alpha \log n)}$.

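We can write the expected value of $X_i$ as

$$\mathbb{E}[X_i] = \mathbb{E}[X_i \mid \mathcal{E}_i] \cdot \Pr(\mathcal{E}_i) + \mathbb{E}[X_i \mid \overline{\mathcal{E}_i}] \cdot \bigl(1 - \Pr(\mathcal{E}_i)\bigr). \qquad (2)$$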


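By Claim 4.4, the first term in this equation is less than 1, since $X_i \le |V^*_i| = 2r$ and $\Pr(\mathcal{E}_i)$ is exponentially small. For simplicity, we neglect this additive value of 1 in Equation (2). We now bound the second term. Since $X_i \le 2r < n$, grouping the values of $X_i$ into the dyadic ranges $[2^{\beta}, 2^{\beta+1})$ (assuming for simplicity that $\log n$ is an integer) gives

$$\mathbb{E}[X_i \mid \overline{\mathcal{E}_i}] = \sum_{j \ge 1} j \cdot \Pr(X_i = j \mid \overline{\mathcal{E}_i}) \le \sum_{\beta=0}^{\log n - 1} 2^{\beta+1} \cdot \Pr(X_i \ge 2^{\beta} \mid \overline{\mathcal{E}_i}). \qquad (3)$$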


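We can now bound $\Pr(X_i \ge 2^{\beta} \mid \overline{\mathcal{E}_i})$ for any $\beta \ge 0$ as follows. Let $F = \Gamma_i(G_i)$; the event $\overline{\mathcal{E}_i}$ implies that $|F| \ge 2^{-rt/(4\alpha \log n)}\,|\mathcal{G}_i|$. By Lemma 4.2, for $I_F$ defined as in the lemma statement, $|I_F| \le \frac{rt}{2^{\beta+2}\,\alpha \log n}$. In the input distribution, $\lambda$ is chosen from $[t]$ uniformly at random. Therefore, the probability that $\lambda \in I_F$ is at most $\frac{r}{2^{\beta+2}\,\alpha \log n}$. Hence,

$$\Pr(X_i \ge 2^{\beta} \mid \overline{\mathcal{E}_i}) \le \frac{r}{2^{\beta+2}\,\alpha \log n}. \qquad (4)$$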


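By plugging inequality (4) into (3), we obtain

$$\mathbb{E}[X_i \mid \overline{\mathcal{E}_i}] \le \sum_{\beta=0}^{\log n - 1} 2^{\beta+1} \cdot \frac{r}{2^{\beta+2}\,\alpha \log n} = \sum_{\beta=0}^{\log n - 1} \frac{r}{2\alpha \log n} = \frac{r}{2\alpha}.$$

Consequently, we have proved inequality (1), i.e., $\mathbb{E}[X_i] \le r/(2\alpha)$.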
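Both steps of this calculation can be checked with exact arithmetic. The sketch below, with hypothetical helper names, evaluates the dyadic bound of inequality (3) for a finite distribution (assuming integer values in $[0, 2^L)$, with $L$ playing the role of $\log n$) and the final sum after substituting inequality (4), which should equal $r/(2\alpha)$.

```python
from fractions import Fraction


def expectation(pmf):
    """E[X] for a finite distribution given as {value: probability}."""
    return sum(Fraction(p) * x for x, p in pmf.items())


def dyadic_bound(pmf, L):
    """Right-hand side of (3): sum over beta < L of 2^(beta+1) * Pr(X >= 2^beta).

    Assumes X takes integer values in [0, 2^L), as X_i <= 2r < n with L = log n.
    """
    assert all(0 <= x < 2 ** L for x in pmf)
    return sum(
        2 ** (b + 1) * sum(Fraction(p) for x, p in pmf.items() if x >= 2 ** b)
        for b in range(L)
    )


def bound_after_plugging_in(r, alpha, L):
    """Sum in the proof of Lemma 4.3 after substituting (4), with L = log n."""
    return sum(2 ** (b + 1) * Fraction(r, 2 ** (b + 2) * alpha * L) for b in range(L))
```

For $X$ uniform on $\{0, \dots, 15\}$ and $L = 4$, the expectation is $15/2$ while the dyadic bound is $155/8$; and `bound_after_plugging_in(64, 4, 10)` returns $8 = 64/(2 \cdot 4)$.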


Proof of Theorem 5. By Lemma 4.3, if every player communicates at most $s = \frac{rt}{5\alpha \log n}$ bits, then the expected number of good vertices matched in the matching output by the coordinator is at most $N/2$; hence, by Markov's inequality, the output matching is a trivial matching with probability at least $1/2$. By Claim 4.1, any trivial matching is no better than an $(\alpha/4)$-approximation to the maximum matching.

Since $\alpha = n^{\varepsilon}$, $k = n^{\varepsilon + o(1)}$, $N = n/k$, and $r \cdot t = \Omega(N^2)$ (by Theorem 3), we have $s = n^{2-3\varepsilon-o(1)}$. Therefore, any simultaneous protocol that obtains a better than $(n^{\varepsilon}/4)$-approximation to the maximum matching with probability greater than $1/2$ has to communicate $n^{2-3\varepsilon-o(1)}$ bits from at least one player.
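The exponent bookkeeping in the last paragraph can be sketched as follows. This is a back-of-the-envelope check that drops every $o(1)$ term and the $\log n$ factor, so the numbers are exact only in the limit; the helper `exponents` is hypothetical. It returns the exponent of $n$ for the number of edges of the hard instance, $kN^2 = n^{2-\varepsilon}$, and for the message-size threshold $s$ of Lemma 4.3, $n^{2-3\varepsilon}$.

```python
from fractions import Fraction


def exponents(eps):
    """Exponents of n for the hard distribution, with every o(1) term dropped."""
    eps = Fraction(eps)
    alpha = eps                # alpha = n^eps
    k = eps                    # k = n^(eps + o(1))
    N = 1 - k                  # N = n / k
    r = N                      # r = N^(1 - o(1))
    rt = 2 * N                 # r * t = Theta(N^2) (Theorem 3)
    assert k == alpha + N - r  # consistency with k = alpha * N / r
    return {
        "edges": k + 2 * N,     # O(k N^2) edges, counting multiplicities
        "message": rt - alpha,  # s = r t / (5 alpha log n), log factor dropped
    }
```

For $\varepsilon = 1/10$ this gives edge exponent $19/10$ and message exponent $17/10$, i.e., $n^{2-\varepsilon}$ edges and an $n^{2-3\varepsilon}$ bound.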
5 Conclusions


In this paper, we resolved the space complexity of linear sketches for approximating the maximum matching problem in dynamic graph streams. In particular, for approximating the maximum matching to within a factor of $n^{\varepsilon}$, we proved that $n^{2-3\varepsilon \pm o(1)}$ bits of space are both sufficient and necessary for every single-pass streaming algorithm that only maintains a linear sketch of the stream.

Our result suggests that achieving a better upper bound for the maximum matching problem requires a new set of techniques. Alternatively, it might be the case that any algorithm
for dynamic graph streams can be implemented as a linear sketch (similar to the equivalence 
between linear sketches and single-pass turnstile algorithms [32]). As noted earlier, to the best 
of our knowledge, every known single-pass streaming algorithm for the general dynamic graph 
streams is indeed of this form (i.e., only maintains a linear sketch). In that case, our bounds 
would characterize the power of any single-pass streaming algorithm for the maximum matching 
problem in dynamic graph streams. 

Acknowledgments

We would like to thank Michael Kapralov and David Woodruff for helpful discussions. 


References 

[1] Bertinoro workshop 2014, problem 64. http://sublinear.info/index.php?title=Open_Problems:64. Accessed: 2015-05-1.

[2] Agarwal, P. K., Cormode, G., Huang, Z., Phillips, J. M., Wei, Z., and Yi, K. Mergeable summaries. ACM Trans. Database Syst. 38, 4 (2013), 26.

[3] Ahn, K. J., and Guha, S. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. CoRR abs/1307.4359 (2013).

[4] Ahn, K. J., and Guha, S. Linear programming in the semi-streaming model with application to the maximum matching problem. Inf. Comput. 222 (2013), 59-79.

[5] Ahn, K. J., Guha, S., and McGregor, A. Analyzing graph structure via linear measurements. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms (2012), SODA '12, SIAM, pp. 459-467.

[6] Ahn, K. J., Guha, S., and McGregor, A. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012 (2012), pp. 5-14.

[7] Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. In STOC (1996), ACM, pp. 20-29.

[8] Alon, N., Moitra, A., and Sudakov, B. Nearly complete graphs decomposable into large induced matchings and their applications. In Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19-22, 2012 (2012), pp. 1079-1090.

[9] Alon, N., Nisan, N., Raz, R., and Weinstein, O. Welfare maximization with limited interaction. Electronic Colloquium on Computational Complexity (ECCC) 22 (2015), 54.

[10] Andoni, A., Nguyen, H. L., Polyanskiy, Y., and Wu, Y. Tight lower bound for linear sketches of moments. In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I (2013), pp. 25-32.




[11] Birk, Y., Linial, N., and Meshulam, R. On the uniform-traffic capacity of single-hop interconnections employing shared directional multichannels. IEEE Transactions on Information Theory 39, 1 (1993), 186-191.

[12] Chitnis, R. H., Cormode, G., Hajiaghayi, M. T., and Monemizadeh, M. Parameterized streaming: Maximal matching and vertex cover. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015 (2015), pp. 1234-1251.

[13] Crouch, M., and Stubbs, D. S. Improved streaming algorithms for weighted matching, via unweighted matching. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2014, September 4-6, 2014, Barcelona, Spain (2014), pp. 96-104.

[14] Dobzinski, S., Nisan, N., and Oren, S. Economic efficiency requires interaction. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014 (2014), pp. 233-242.

[15] Eggert, S., Kliemann, L., and Srivastav, A. Bipartite graph matchings in the semi-streaming model. In Algorithms - ESA 2009, 17th Annual European Symposium, Copenhagen, Denmark, September 7-9, 2009. Proceedings (2009), pp. 492-503.

[16] Epstein, L., Levin, A., Mestre, J., and Segev, D. Improved approximation guarantees for weighted matching in the semi-streaming model. SIAM J. Discrete Math. 25, 3 (2011), 1251-1265.

[17] Esfandiari, H., Hajiaghayi, M. T., Liaghat, V., Monemizadeh, M., and Onak, K. Streaming algorithms for estimating the matching size in planar graphs and beyond. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015 (2015), pp. 1217-1233.

[18] Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., and Zhang, J. On graph problems in a semi-streaming model. Theor. Comput. Sci. 348, 2-3 (2005), 207-216.

[19] Fischer, E., Lehman, E., Newman, I., Raskhodnikova, S., Rubinfeld, R., and Samorodnitsky, A. Monotonicity testing over general poset domains. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montreal, Quebec, Canada (2002), pp. 474-483.

[20] Frahling, G., Indyk, P., and Sohler, C. Sampling in dynamic data streams and applications. International Journal of Computational Geometry & Applications 18, 01n02 (2008), 3-28.

[21] Goel, A., Kapralov, M., and Khanna, S. On the communication and streaming complexity of maximum bipartite matching. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms (2012), SODA '12, SIAM, pp. 468-485.

[22] Guruswami, V., and Onak, K. Superlinear lower bounds for multipass graph processing. In Proceedings of the 28th Conference on Computational Complexity, CCC 2013, Palo Alto, California, USA, 5-7 June, 2013 (2013), pp. 287-298.

[23] Hardt, M., and Woodruff, D. P. How robust are linear sketches to adaptive inputs? In Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013 (2013), pp. 121-130.

[24] Huang, Z., Radunovic, B., Vojnovic, M., and Zhang, Q. Communication complexity of approximate matching in distributed graphs. In 32nd International Symposium on Theoretical Aspects of Computer Science, STACS 2015, March 4-5, 2015, Garching, Germany (2015), pp. 460-473.




[25] Jowhari, H., Saglam, M., and Tardos, G. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (2011), ACM, pp. 49-58.

[26] Kapralov, M. Better bounds for matchings in the streaming model. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013 (2013), pp. 1679-1697.

[27] Kapralov, M., Khanna, S., and Sudan, M. Approximating matching size from random streams. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014 (2014), pp. 734-751.

[28] Kapralov, M., Lee, Y. T., Musco, C., Musco, C., and Sidford, A. Single pass spectral sparsification in dynamic streams. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014 (2014), pp. 561-570.

[29] Konrad, C. Maximum matching in turnstile streams. Manuscript, May, 2015.

[30] Konrad, C., Magniez, F., and Mathieu, C. Maximum matching in semi-streaming with few passes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, Cambridge, MA, USA, August 15-17, 2012. Proceedings (2012), pp. 231-242.

[31] Kushilevitz, E., and Nisan, N. Communication complexity. Cambridge University Press, 1997.

[32] Li, Y., Nguyen, H. L., and Woodruff, D. P. Turnstile streaming algorithms might as well be linear sketches. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014 (2014), pp. 174-183.

[33] McGregor, A. Finding graph matchings in data streams. In Approximation, Randomization and Combinatorial Optimization, Algorithms and Techniques, 8th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2005 and 9th International Workshop on Randomization and Computation, RANDOM 2005, Berkeley, CA, USA, August 22-24, 2005, Proceedings (2005), pp. 170-181.

[34] McGregor, A. Graph stream algorithms: a survey. SIGMOD Record 43, 1 (2014), 9-20.

[35] Motwani, R., and Raghavan, P. Randomized Algorithms. Cambridge University Press, 1995.

[36] Muthukrishnan, S. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science 1, 2 (2005).

[37] Ruzsa, I. Z., and Szemeredi, E. Triple systems with no six points carrying three triangles. Combinatorics (Keszthely, 1976), Coll. Math. Soc. J. Bolyai 18 (1978), 939-945.

[38] Schmidt, J. P., Siegel, A., and Srinivasan, A. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8, 2 (1995), 223-250.

[39] Woodruff, D. P. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science 10, 1-2 (2014), 1-157.

[40] Zelke, M. Weighted matching in the semi-streaming model. Algorithmica 62, 1-2 (2012), 1-20.




A Omitted Proofs 

A.1 Omitted proofs from Lemma 3.3

Claim. Suppose we assign $x$ balls to $y$ bins independently and uniformly at random. With probability at least $1/2$, the number of non-empty bins is at least $\min\{x, y\}/3$.

Proof. For each bin, the probability that the bin is empty is

$$\left(1 - \frac{1}{y}\right)^{x} \le e^{-x/y}.$$

We consider two cases. If $x/y \ge 1.5$, then

$$e^{-x/y} \le e^{-1.5} < \frac{1}{4}.$$

Hence the expected number of empty bins is at most $y/4$, and by Markov's inequality, with probability at least $1/2$, the number of empty bins is at most $y/2$; that is, at least $y/2 \ge \min\{x,y\}/3$ bins are non-empty.

If $x/y < 1.5$, since $e^{-z} \le 1 - z/2$ for $z \in [0, 1.5]$,

$$1 - \left(1 - \frac{1}{y}\right)^{x} \ge 1 - e^{-x/y} \ge \frac{x}{2y}.$$

Hence the probability that a bin is non-empty is at least $x/(2y)$, and the expected number of non-empty bins is at least $x/2$. Since a bin being non-empty is negatively correlated with other bins being non-empty, by the extended Chernoff bound, with probability at least $1 - e^{-\Omega(x)}$, the number of non-empty bins is at least $x/3$.

Hence, overall, the number of non-empty bins is at least $\min\{x,y\}/3$ with probability at least $1/2$. □
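A quick Monte Carlo check of the claim (not a proof; the function names are hypothetical): `success_rate` estimates the probability that at least $\min\{x,y\}/3$ bins are non-empty, which the claim says is at least $1/2$. A seeded generator keeps runs reproducible.

```python
import random


def nonempty_bins(x, y, rng):
    """Throw x balls into y bins uniformly at random; return the number of non-empty bins."""
    return len({rng.randrange(y) for _ in range(x)})


def success_rate(x, y, trials=1000, seed=0):
    """Empirical Pr[#non-empty bins >= min(x, y) / 3] over independent trials."""
    rng = random.Random(seed)
    hits = sum(3 * nonempty_bins(x, y, rng) >= min(x, y) for _ in range(trials))
    return hits / trials
```

Both regimes of the proof can be exercised, e.g. `success_rate(1000, 10)` for $x/y \ge 1.5$ and `success_rate(10, 1000)` for $x/y < 1.5$.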

A.2 Omitted proofs from Lemma 4.3

Claim. For any $i \in [k]$, $\Pr(\mathcal{E}_i) \le 2^{-rt/(20\alpha \log n)}$.

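Proof. Let $o \in \{0,1\}^s$ be an output of the function $\phi_i$, and, with a slight abuse of notation, let $\Gamma_i(o) = \Gamma_i(G)$ for some $G$ such that $\phi_i(G) = o$. We say $o$ is light iff $|\Gamma_i(o)| < 2^{-rt/(4\alpha \log n)}\,|\mathcal{G}_i|$. Since $G_i$ is uniform over $\mathcal{G}_i$ and there are at most $2^s$ light messages, we have

$$\Pr(\mathcal{E}_i) = \sum_{o \in \{0,1\}^s:\ o \text{ is light}} \Pr_{G \sim \mathcal{G}_i}\bigl(\phi_i(G) = o\bigr) = \sum_{o \in \{0,1\}^s:\ o \text{ is light}} \frac{|\Gamma_i(o)|}{|\mathcal{G}_i|} < 2^{s} \cdot 2^{-rt/(4\alpha \log n)} = 2^{-rt/(20\alpha \log n)},$$

where the last equality uses $s = rt/(5\alpha \log n)$.

□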
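The counting argument does not depend on how $\phi_i$ is chosen: there are at most $2^s$ light messages and each has fewer inputs than the threshold. The sketch below (hypothetical helpers; `messages[g]` plays the role of $\phi_i(g)$ for a uniformly random input $g$) computes the probability of drawing a light message and, for comparison, the bound $2^s \cdot \text{threshold} / |\mathcal{G}_i|$.

```python
from collections import Counter
from fractions import Fraction


def light_probability(messages, threshold):
    """Pr over a uniform input that its message is light (fiber smaller than threshold).

    messages[g] plays the role of phi_i(g); each distinct value is one message o.
    """
    sizes = Counter(messages)  # |Gamma_i(o)| for every message o that occurs
    light = sum(c for c in sizes.values() if c < threshold)
    return Fraction(light, len(messages))


def counting_bound(s, threshold, num_inputs):
    """At most 2^s light messages, each with fewer than `threshold` inputs."""
    return Fraction(2 ** s * threshold, num_inputs)
```

Even a very skewed encoder, e.g. one sending 1000 inputs to message 0 and one input to each of the other 15 messages of a 4-bit code, has light probability $15/1015$, well under the bound $160/1015$ for threshold 10.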